The deployment of robots in uncontrolled environments requires them to operate robustly under previously unseen scenarios, like irregular terrain and wind conditions. Unfortunately, while rigorous safety frameworks from robust optimal control theory scale poorly to high-dimensional nonlinear dynamics, control policies computed by more tractable "deep" methods lack guarantees and tend to exhibit little robustness to uncertain operating conditions. This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems with general nonlinear dynamics subject to bounded modeling error by combining game-theoretic safety analysis with adversarial reinforcement learning in simulation. Following a soft actor-critic scheme, a safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error and training-to-deployment discrepancy allowed by the designer's uncertainty. While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter (or shield) with robust safety guarantees based on forward reachability rollouts. This shield can be used in conjunction with a safety-agnostic control policy, precluding any task-driven actions that could result in loss of safety. We evaluate our learning-based safety approach in a 5D race car simulator, compare the learned safety policy to the numerically obtained optimal solution, and empirically validate the robust safety guarantee of our proposed safety shield against worst-case model discrepancy.
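The rollout-based safety filter described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 1D double-integrator dynamics, the bounds `U_MAX` and `D_MAX`, the wall at `X_WALL`, and the hand-written `fallback_policy` and `adversary`, which stand in for the learned safety policy and the learned disturbance agent.

```python
# Hypothetical toy setup: position x, velocity v, failure when x >= X_WALL.
DT = 0.1        # integration step (assumed)
U_MAX = 1.0     # control bound (assumed)
D_MAX = 0.2     # disturbance bound (assumed)
X_WALL = 1.0    # failure set boundary (assumed)
HORIZON = 50    # maximum rollout length (assumed)


def step(state, u, d):
    """One Euler step of the double integrator under control u and disturbance d."""
    x, v = state
    return (x + DT * v, v + DT * (u + d))


def fallback_policy(state):
    """Stand-in for the learned safety policy: brake as hard as possible."""
    return -U_MAX


def adversary(state):
    """Stand-in for the learned disturbance agent: push toward the wall."""
    return D_MAX


def rollout_is_safe(state):
    """Simulate fallback vs. adversary; safe if the system stops before the wall."""
    for _ in range(HORIZON):
        x, v = state
        if x >= X_WALL:
            return False          # entered the failure set
        if v <= 0.0:
            return True           # moving away from the wall; braking keeps it so
        state = step(state, fallback_policy(state), adversary(state))
    return False                  # undecided within the horizon: treat as unsafe


def shield(state, task_action):
    """Accept the task action only if the fallback can still recover afterwards."""
    candidate = step(state, task_action, adversary(state))
    if rollout_is_safe(candidate):
        return task_action
    return fallback_policy(state)
```

In this toy setting, `shield((0.0, 0.0), 1.0)` passes the task action through, while `shield((0.5, 1.0), 1.0)` overrides it with the braking fallback because the simulated worst-case rollout would cross the wall.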
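Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.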
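Safety is a critical component of autonomous systems and remains a challenge for learning-based policies to be used in the real world. In particular, policies learned using reinforcement learning often fail to generalize to novel environments due to unsafe behavior. In this paper, we propose Sim-to-Lab-to-Real to bridge the reality gap with a probabilistically guaranteed safety-aware policy distribution. To improve safety, we adopt a dual policy setup in which a performance policy is trained using the cumulative task reward and a backup (safety) policy is trained based on Hamilton-Jacobi (HJ) reachability analysis. In Sim-to-Lab transfer, we apply a supervisory control scheme to shield unsafe actions during exploration; in Lab-to-Real transfer, we leverage the Probably Approximately Correct (PAC)-Bayes framework to provide lower bounds on the expected performance and safety of policies in unseen environments. Additionally, inheriting from the HJ reachability analysis, the bound accounts for the expectation over the worst-case safety in each environment. We empirically study the proposed framework for ego-vision navigation in two types of indoor environments with varying degrees of photorealism. We also demonstrate strong generalization performance through hardware experiments in real indoor spaces with a quadrupedal robot. See https://sites.google.com/princeton.edu/sim-to-lab-to-real for supplementary material.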
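Reach-avoid optimal control problems, in which the system must reach certain goal conditions while staying clear of unacceptable failure modes, are central to safety and liveness assurance for autonomous robotic systems, but their exact solutions are intractable for complex dynamics and environments. Recent successes of reinforcement learning methods in approximately solving optimal control problems with performance objectives make their application to certification problems attractive; however, the Lagrange-type objective used in reinforcement learning is not suitable for encoding temporal logic requirements. Recent work has shown promise in extending the reinforcement learning machinery to safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time. In this work, we generalize the reinforcement learning formulation to handle all optimal control problems in the reach-avoid category. We derive a time-discounted reach-avoid Bellman backup with contraction mapping properties and prove that the resulting reach-avoid Q-learning algorithm converges under conditions analogous to those of the traditional Lagrange-type problem, yielding an arbitrarily tight conservative approximation to the reach-avoid set. We further demonstrate the use of this formulation with deep reinforcement learning methods, retaining zero-violation guarantees by treating the approximate solutions as untrusted oracles in a model-predictive supervisory control framework. We evaluate our proposed framework on a range of nonlinear systems, validating the results against analytic and numerical solutions, and through Monte Carlo simulation in previously intractable problems. Our results open the door to a range of learning-based methods for safe-and-live autonomous behavior, with applications across robotics and automation. See https://github.com/saferoboticslab/safett_rl for code and supplementary material.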
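To make the idea of a time-discounted reach-avoid backup concrete, here is a small tabular sketch on a five-state chain. The abstract does not spell out the backup or its sign conventions, so the form used below is an assumption based on one common formulation: a target margin `l` that is non-positive inside the target, a failure margin `g` that is positive inside the failure set, and a controller that minimizes the value. The chain world, the margins, and the function name are all hypothetical.

```python
import numpy as np


def reach_avoid_value_iteration(l, g, next_state, gamma=0.9, iters=500):
    """Tabular value iteration with an assumed discounted reach-avoid backup.

    l[s]  <= 0  inside the target set (assumed convention)
    g[s]  >  0  inside the failure set (assumed convention)
    next_state[s, a] is the deterministic successor of state s under action a.
    """
    V = np.maximum(l, g).astype(float)
    for _ in range(iters):
        # The controller picks the action leading to the lowest successor value.
        best_next = V[next_state].min(axis=1)
        # Outer max with g: failing now can never be undone later.
        # Inner min with l: reaching the target now ends the game successfully.
        V = (1 - gamma) * np.maximum(l, g) + gamma * np.maximum(
            g, np.minimum(l, best_next)
        )
    return V


# Hypothetical chain: state 0 is the failure set, state 4 is the target.
n = 5
l = np.ones(n)
l[-1] = -1.0
g = -np.ones(n)
g[0] = 1.0
states = np.arange(n)
next_state = np.stack(
    [np.clip(states - 1, 0, n - 1), np.clip(states + 1, 0, n - 1)], axis=1
)

V = reach_avoid_value_iteration(l, g, next_state)
reach_avoid_set = np.flatnonzero(V < 0)  # states from which the target is reachable safely
```

Under these assumptions the negative sublevel set of `V` recovers states 1 through 4, and the magnitude of the value shrinks with distance from the target because of discounting, which is what makes the approximation conservative.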
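Safety-critical applications require controllers/policies that can guarantee safety with high confidence. The control barrier function is a useful tool for guaranteeing safety if we have access to the ground-truth system dynamics. In practice, our knowledge of the system dynamics is inaccurate, and the resulting unmodeled residual dynamics can lead to unsafe behavior. Learning the residual dynamics with deterministic machine learning models can prevent unsafe behavior but can fail when the predictions are imperfect. In this situation, a probabilistic learning method that reasons about the uncertainty of its predictions can help provide robust safety margins. In this work, we use a Gaussian process (GP) to model the projection of the residual dynamics onto a control barrier function. We propose a novel optimization procedure to generate safe controls that can guarantee safety with high probability. The safety filter is able to reason about the uncertainty of the GP's predictions. We show the efficacy of this method through experiments on Segway and quadrotor simulations. Compared with a deterministic approach using a neural network, our proposed probabilistic approach significantly reduces the number of safety violations.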
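We study the class of reach-avoid dynamic games in which multiple agents interact and each wishes to satisfy a distinct target condition while avoiding a failure condition. Reach-avoid games are commonly used to express safety-critical optimal control problems found in mobile robot motion planning. While a wide variety of approaches exist for these motion planning problems, we focus on finding time-consistent solutions, in which planned future motion remains optimal despite prior suboptimal actions. Though abstract, time consistency encapsulates an extremely desirable property: time-consistent motion plans remain optimal even when a robot's motion diverges from the plan early on due to, e.g., intrinsic dynamic uncertainty or extrinsic environment disturbances. Our main contribution is an algorithm for computing time-consistent solutions to multi-agent reach-avoid games. We demonstrate our approach in two- and three-player simulated driving scenarios, in which our method provides safe control strategies for all agents.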
Array programming provides a powerful, compact, expressive syntax for accessing, manipulating, and operating on data in vectors, matrices, and higher-dimensional arrays [1]. NumPy is the primary array programming library for the Python language [2,3,4,5]. It plays an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, material science, engineering, finance, and economics. For example, in astronomy, NumPy was an important part of the software stack used in the discovery of gravitational waves [6] and the first imaging of a black hole [7]. Here we show how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring, and analyzing scientific data. NumPy is the foundation upon which the entire scientific Python universe is constructed. It is so pervasive that several projects, targeting audiences with specialized needs, have developed their own NumPy-like interfaces and array objects. Because of its central position in the ecosystem, NumPy increasingly plays the role of an interoperability layer between these new array computation libraries.
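As a brief illustration of the array concepts the abstract alludes to, the sketch below combines a reduction, broadcasting, and boolean-mask indexing on a small made-up array. The data values are arbitrary and chosen only for demonstration.

```python
import numpy as np

# Two observations (rows) of three variables (columns); values are illustrative.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Reduction: collapse the row axis to get one mean per column.
col_means = data.mean(axis=0)

# Broadcasting: the (3,) vector of means is stretched across both rows.
centered = data - col_means

# Boolean-mask indexing: select only the entries above their column mean.
above_mean = centered[centered > 0]
```

Each step operates on whole arrays without an explicit Python loop, which is the core of the paradigm the abstract describes.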
Most benchmarks for studying surgical interventions focus on a specific challenge instead of leveraging the intrinsic complementarity among different tasks. In this work, we present a new experimental framework towards holistic surgical scene understanding. First, we introduce the Phase, Step, Instrument, and Atomic Visual Action recognition (PSI-AVA) Dataset. PSI-AVA includes annotations for both long-term (Phase and Step recognition) and short-term reasoning (Instrument detection and novel Atomic Action recognition) in robot-assisted radical prostatectomy videos. Second, we present Transformers for Action, Phase, Instrument, and steps Recognition (TAPIR) as a strong baseline for surgical scene understanding. TAPIR leverages our dataset's multi-level annotations as it benefits from the learned representation on the instrument detection task to improve its classification capacity. Our experimental results in both PSI-AVA and other publicly available databases demonstrate the adequacy of our framework to spur future research on holistic surgical scene understanding.
Motion prediction systems aim to capture the future behavior of traffic scenarios, enabling autonomous vehicles to perform safe and efficient planning. The evolution of these scenarios is highly uncertain and depends on the interactions of agents with static and dynamic objects in the scene. GNN-based approaches have recently gained attention as they are well suited to naturally model these interactions. However, one of the main challenges that remains unexplored is how to address the complexity and opacity of these models in order to deal with the transparency requirements for autonomous driving systems, which include aspects such as interpretability and explainability. In this work, we aim to improve the explainability of motion prediction systems by using different approaches. First, we propose a new Explainable Heterogeneous Graph-based Policy (XHGP) model based on a heterograph representation of the traffic scene and lane-graph traversals, which learns interaction behaviors using object-level and type-level attention. This learned attention provides information about the most important agents and interactions in the scene. Second, we explore this same idea with the explanations provided by GNNExplainer. Third, we apply counterfactual reasoning to provide explanations of selected individual scenarios by exploring the sensitivity of the trained model to changes made to the input data, i.e., masking some elements of the scene, modifying trajectories, and adding or removing dynamic agents. The explainability analysis provided in this paper is a first step towards more transparent and reliable motion prediction systems, important from the perspective of users, developers, and regulatory agencies. The code to reproduce this work is publicly available at https://github.com/sancarlim/Explainable-MP/tree/v1.1.
Calibration is a popular framework to evaluate whether a classifier knows when it does not know - i.e., its predictive probabilities are a good indication of how likely a prediction is to be correct. Correctness is commonly estimated against the human majority class. Recently, calibration to human majority has been measured on tasks where humans inherently disagree about which class applies. We show that measuring calibration to human majority given inherent disagreements is theoretically problematic, demonstrate this empirically on the ChaosNLI dataset, and derive several instance-level measures of calibration that capture key statistical properties of human judgements - class frequency, ranking and entropy.
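To illustrate what an instance-level comparison between a classifier and a distribution of human judgements might look like, here is a small sketch touching the three properties named above (class frequency, ranking, and entropy). The specific quantities computed, the function names, and the example numbers are illustrative assumptions and are not claimed to be the measures derived in the paper.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability classes."""
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]
    return float(-(nonzero * np.log(nonzero)).sum())


def instance_level_comparison(model_probs, human_counts):
    """Compare one predictive distribution with one item's human label counts."""
    model = np.asarray(model_probs, dtype=float)
    human = np.asarray(human_counts, dtype=float)
    human = human / human.sum()  # empirical human label distribution
    return {
        # Class frequency: total variation distance between the distributions.
        "tvd": 0.5 * float(np.abs(model - human).sum()),
        # Ranking: do model and humans order the classes identically?
        "same_ranking": bool(
            (np.argsort(-model, kind="stable")
             == np.argsort(-human, kind="stable")).all()
        ),
        # Entropy: negative means the model is more certain than the annotators.
        "entropy_gap": entropy(model) - entropy(human),
    }


# Hypothetical item: 100 annotators split 50/30/20 over three classes.
result = instance_level_comparison([0.7, 0.2, 0.1], [50, 30, 20])
```

In this made-up example the model agrees with the human majority class and class ranking, yet it is noticeably more confident than the annotators, which a majority-only notion of correctness would not reveal.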